-
Notifications
You must be signed in to change notification settings - Fork 549
chore(types): Type-clean llm/ (27 errors) #1394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Converting to draft while I rebase on the latest changes to develop. |
eb83d1b
to
bb67063
Compare
…a_ai_endpoints for Github CI tests
…points work for pyright type-checking
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
) | ||
|
||
llm_result = self._generate( | ||
llm_result = getattr(self, "_generate")( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think getattr does just fool Pyright to ignoring the error and it is better to directly ignore it
llm_result = self._generate( #type: ignore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for all getattr usage below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better exclude nemoguardrails/llm/providers/_langchain_nvidia_ai_endpoints_patch.py from pyright checking. This file is a runtime patch for an optional dependency, and type-checking it provides minimal value compared to the cost of installing all extras in CI.
can we exclude it in pyproject.toml?
exclude = [
"nemoguardrails/llm/providers/trtllm/**",
"nemoguardrails/llm/providers/_langchain_nvidia_ai_endpoints_patch.py"
]
then we should restore the original state of workflows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we revert trtllm related changes as it is excluded from type checking?
"tests/test_callbacks.py", | ||
] | ||
|
||
# tritonclient is only supported for Python <= 3.8, imports fail pyright-checking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it has nothing to do with Python version compatibility tritonclient
is not in dependencies (optional external package), imports fail pyright-checking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module is not used anymore, I should have deprecated it as part of #1387 . Sorry for the pain
""" | ||
if hasattr(llm_instance, "model_kwargs"): | ||
return llm_instance.model_kwargs | ||
return getattr(llm_instance, "model_kwargs") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How Pyright doesn't understand this?
|
||
try: | ||
model_name = llm_config.model | ||
model_name = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in rails/llm/config.py
@model_validator(mode="before")
@classmethod
def set_and_validate_model(cls, data: Any) -> Any:
if isinstance(data, dict):
parameters = data.get("parameters")
if parameters is None:
return data
model_field = data.get("model")
model_from_params = parameters.get("model_name") or parameters.get("model")
if model_field and model_from_params:
raise ValueError(
"Model name must be specified in exactly one place: either in the 'model' field or in parameters, not both."
)
if not model_field and model_from_params:
data["model"] = model_from_params
if (
"model_name" in parameters
and parameters["model_name"] == model_from_params
):
parameters.pop("model_name")
elif "model" in parameters and parameters["model"] == model_from_params:
parameters.pop("model")
return data
ensures that llm_config always have a model field. If pyright does not get this and it gets resolved using #type: ignore then we should ignore it.
) | ||
|
||
|
||
# later we can easily conver it to a class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# later we can easily conver it to a class | |
# later we can easily convert it to a class |
TL;DR
This type-cleaning PR needed a couple of changes I'd like to get feedback on (especially from @Pouyanpi ).
langchain_nvidia_ai_endpoints
package innemoguardrails/llm/providers/_langchain_nvidia_ai_endpoints_patch.py
. To type-check against this we need to install thelangchain-nvidia-ai-endpoints
package. I did this by modifyingpr-tests.yml
andtest-coverage-report.yml
to add the--all-extras
option topoetry install
. Any concerns with adding in this amount of extra packages?nemoguardrails/llm/providers/trtllm/client.py
. The latest version of Python supported by tritonclient is 3.8 which we deprecated on the last release. The code isn't referred to anywhere else. Should we deprecate this? I added it to the pyright exclude list while we decide what to do.Type-Safety Fixes Summary
This report summarizes the code changes made to improve type safety. The fixes have been categorized into high, medium, and low-risk buckets based on their potential to cause regressions or unintended behavior.
🔴 High-Risk Changes
This category includes changes that alter a function's core contract, making it a potentially breaking change for consumers of the API.
Stricter Function Argument Type
This fix tightens a function's API by changing an optional argument to a required one. This is high-risk because any calling code that previously passed
None
will now fail.nemoguardrails/llm/models/initializer.py
27
model_name
wasOptional[str]
, but the function's logic implicitly required a string value, creating a risk of aNone
value causing a downstream crash.model_name
was changed fromOptional[str]
tostr
, making it a required argument. This enforces the contract at the type-checking level and makes the function's expectations explicit.model_name
should never beNone
and that all call sites can provide a valid string. This assumption is supported by related changes innemoguardrails/rails/llm/llmrails.py
which now ensure a valid model name is always passed.Optional[str]
and raise aValueError
inside the function ifmodel_name
isNone
. The implemented fix is better because it fails earlier (at static analysis or integration time) rather than at runtime.🟡 Medium-Risk Changes
These changes correct significant errors or modify function signatures in a way that is likely correct but could affect downstream code that depended on the previous (incorrect) behavior.
1. Corrected Function Return Type Annotation
The function's implementation returned a different type than what its signature declared.
nemoguardrails/llm/filters.py
278
str
but was actually returning aList[dict]
.str
toList[dict]
to match the actual output of the function. This resolves the inconsistency and provides the correct type information to static analyzers and developers.List[dict]
and that no code was relying on the incorrectstr
annotation.2. Modified Function Signature to Match Superclass
The method signature in a subclass did not match the signature of the method it was overriding in the parent class.
nemoguardrails/llm/helpers.py
77-80
and88-91
_call
and_acall
methods in theWrapperLLM
class were missing the**kwargs
parameter present in the baseLLM
class from LangChain, causing a signature mismatch error (Liskov Substitution Principle violation).**kwargs
was added to the method signatures and passed through to the wrappedllm_instance
call. This makes the wrapper compliant with the base class interface, allowing it to accept and forward any additional keyword arguments.llm_instance
also accepts**kwargs
in its_call
and_acall
methods, which is a safe assumption for LangChain LLM objects.3. Added Control Flow for Optional Values in Core Logic
A loop condition involved comparing a value with a potentially
None
attribute, which would cause aTypeError
at runtime.nemoguardrails/llm/taskmanager.py
268-269
and294-296
len(task_prompt)
withprompt.max_length
inside awhile
loop without checking ifprompt.max_length
wasNone
.prompt.max_length is not None
was added to the loop condition. This ensures the length comparison only happens when a maximum length is actually defined, preventing aTypeError
.prompt.max_length
isNone
, no length constraint should be applied. This is the standard and logical interpretation of such an optional parameter.max_length
if it'sNone
, but the implemented guard condition is more explicit and readable.🟢 Low-Risk Changes
These changes are additive or fix minor issues with minimal to no risk of causing regressions. They include adding type hints,
None
checks in non-critical paths, and improving CI configurations.1. Added Type Hints to Local and Instance Variables
Variables were initialized without explicit types, reducing code clarity and hindering static analysis.
nemoguardrails/llm/filters.py
,nemoguardrails/llm/params.py
,nemoguardrails/llm/providers/huggingface/streamers.py
2. Improved Runtime Safety with
None
Checks (Guard Clauses)Code that accessed or iterated over potentially
None
values lacked proper checks, creating a risk of runtime errors.nemoguardrails/llm/taskmanager.py
,nemoguardrails/rails/llm/llmrails.py
None
before attempting to use an object. This is a defensive programming practice that makes the code more robust againstTypeError
orAttributeError
exceptions.3. Defensive Attribute Access with
getattr
Direct attribute access (
obj.attr
) was used on objects where the attribute might not exist, risking anAttributeError
.nemoguardrails/llm/helpers.py
,nemoguardrails/llm/params.py
,nemoguardrails/llm/providers/huggingface/pipeline.py
getattr(obj, "attr", default_value)
. This safely retrieves an attribute, providing a default value if it doesn't exist, thus preventing crashes and making the code more resilient.4. Graceful Handling of Optional Dependencies
The code would crash with an
ImportError
if optional dependencies liketritonclient
ortransformers
were not installed, even if their functionality wasn't being used.nemoguardrails/llm/providers/trtllm/client.py
,nemoguardrails/llm/providers/huggingface/pipeline.py
try...except ImportError
blocks to set a boolean flag (e.g.,TRITONCLIENT_AVAILABLE
). Type hints are handled usingif TYPE_CHECKING:
blocks andAny
fallbacks. At runtime, anImportError
is raised only when the specific functionality is actually invoked. This makes the library's import structure more robust.5. Enhanced CI/CD and Static Analysis Configuration
The continuous integration and static analysis configurations were improved for better test coverage and dependency management.
.github/workflows/*
,pyproject.toml
--all-extras
), ensuring that optional features are also tested.pyproject.toml
file was updated to exclude thetrtllm
provider frompyright
checks, which is a pragmatic solution given thattritonclient
can be difficult to type-check correctly. This prevents the CI pipeline from failing on type errors related to an optional, platform-specific dependency.Test Plan
Type-checking
$ poetry run pre-commit run --all-files check yaml...............................................................Passed fix end of files.........................................................Passed trim trailing whitespace.................................................Passed isort (python)...........................................................Passed black....................................................................Passed Insert license in comments...............................................Passed pyright..................................................................Passed
Unit-tests
Local CLI check
Related Issue(s)
Top-level PR to merge into before develop-branch merge: #1367
Checklist